231 research outputs found
Exploring Communities in Large Profiled Graphs
Given a graph G and a vertex q, the community search (CS) problem
aims to efficiently find a subgraph of G whose vertices are closely related
to q. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index that facilitates efficient
online solutions for PCS.
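A common formalization of community search (the abstract does not specify the exact cohesiveness model, so this is an illustrative sketch, not the paper's algorithm) is to return the connected k-core containing the query vertex: iteratively peel vertices whose degree drops below k, then keep the component of q.

```python
from collections import deque

def community_search(adj, q, k=2):
    """Illustrative sketch: return the connected k-core containing query
    vertex q. adj maps each vertex to its set of neighbours."""
    # Peel vertices whose degree falls below k (k-core decomposition).
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, nbrs in alive.items() if len(nbrs) < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        for u in alive.pop(v):
            if u in alive:
                alive[u].discard(v)
                if len(alive[u]) < k:
                    queue.append(u)
    if q not in alive:
        return set()  # q survives in no k-core
    # Return the connected component of the k-core that contains q.
    seen, stack = {q}, [q]
    while stack:
        for u in alive[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen
```

Profiled CS additionally constrains the returned community to share a common label theme, which a real solution would enforce on top of such a structural core.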
Decouple Knowledge from Parameters for Plug-and-Play Language Modeling
Pre-trained language models (PLMs) have achieved impressive results in various
NLP tasks. It has been revealed that one of the key factors in their success is
that the parameters of these models implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic given that
knowledge is constantly evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge a PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with
differentiable plug-in memory (DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings: (1) domain adaptation, where PlugLM obtains an average improvement of
3.95 F1 across four domains without any in-domain pre-training; (2) knowledge
update, where PlugLM can absorb new knowledge in a training-free way after
pre-training is done; and (3) in-task knowledge learning, where PlugLM can be
further improved by incorporating training samples into the DPM with knowledge
prompting.
Comment: ACL 2023 Findings
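The core of a key-value plug-in memory is retrieval: score the query representation against all memory keys, keep the top-k entries, and fuse their values with attention weights. The following is a minimal sketch under that assumption (the function name and fusion rule are illustrative, not PlugLM's exact implementation); note that editing the knowledge is a plain array update, with no gradient steps.

```python
import numpy as np

def retrieve_from_dpm(query, keys, values, top_k=2):
    """Illustrative sketch of key-value memory retrieval: score every
    memory key against the query, keep the top-k entries, and fuse their
    values with softmax-normalised attention weights."""
    scores = keys @ query                  # similarity of query to each key
    top = np.argsort(scores)[-top_k:]      # indices of the k best-scoring keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()               # softmax over the retrieved scores
    return weights @ values[top]           # fused knowledge vector

# Knowledge update is training-free: overwrite or append rows of
# `keys` and `values` and the next retrieval sees the new knowledge.
```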
Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View
Graph Neural Networks (GNNs) have achieved promising performance on a wide
range of graph-based tasks. Despite their success, one severe limitation of
GNNs is the over-smoothing issue (indistinguishable representations of nodes in
different classes). In this work, we present a systematic and quantitative
study on the over-smoothing issue of GNNs. First, we introduce two quantitative
metrics, MAD and MADGap, to measure the smoothness and over-smoothness of the
graph node representations, respectively. Then, we verify that smoothing is
inherent to GNNs, and that the critical factor leading to over-smoothness is the
low information-to-noise ratio of the message received by the nodes, which is
partially determined by the graph topology. Finally, we propose two methods to
alleviate the over-smoothing issue from the topological view: (1) MADReg, which
adds a MADGap-based regularizer to the training objective; and (2) AdaGraph, which
optimizes the graph topology based on the model predictions. Extensive
experiments on 7 widely-used graph datasets with 10 typical GNN models show
that the two proposed methods are effective for relieving the over-smoothing
issue, thus improving the performance of various GNN models.
Comment: Accepted by AAAI 2020. This complete version contains the appendix
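The two metrics can be sketched directly from the abstract's description: MAD is the mean cosine distance over a selected set of node pairs, and MADGap contrasts remote pairs with neighbouring pairs. The masking convention below is an assumption for illustration (the paper defines the exact neighbour/remote split by graph distance).

```python
import numpy as np

def mad(h, mask):
    """Mean Average Distance: average cosine distance between the node
    pairs selected by the boolean mask. h is (num_nodes, dim)."""
    norm = np.linalg.norm(h, axis=1, keepdims=True)
    cos = (h @ h.T) / (norm * norm.T)      # pairwise cosine similarity
    dist = 1.0 - cos                       # cosine distance
    sel = dist * mask
    # Average first over each node's selected targets, then over nodes
    # that have at least one selected target.
    row_cnt = mask.sum(axis=1)
    row_avg = np.divide(sel.sum(axis=1), row_cnt,
                        out=np.zeros(len(h)), where=row_cnt > 0)
    return row_avg[row_cnt > 0].mean()

def mad_gap(h, remote_mask, neighbor_mask):
    """MADGap: MAD over remote node pairs minus MAD over neighbouring
    pairs. Small or negative values indicate over-smoothing."""
    return mad(h, remote_mask) - mad(h, neighbor_mask)
```

A MADGap near zero means remote nodes are as similar as neighbours, i.e. representations have become indistinguishable.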
Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning
Parameter-efficient tuning methods (PETs) have achieved promising results in
tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and
additional tunable parameters as systems and controls respectively, PETs can be
theoretically grounded to optimal control and further viewed as optimizing the
terminal cost and running cost in the optimal control literature. Despite the
elegance of this theoretical grounding, in practice, existing PETs often ignore
the running cost and only optimize the terminal cost, i.e., focus on optimizing
the loss function of the output state, regardless of the running cost that
depends on the intermediate states. Since it is non-trivial to directly model
the intermediate states and design a running cost function, we propose to use
latent stochastic bridges to regularize the intermediate states and use the
regularization as the running cost of PETs. As the first work to propose
regularized PETs that use stochastic bridges as the regularizers (running
costs) for the intermediate states, we show the effectiveness and generality of
this regularization across different tasks, PLMs and PETs. In view of the great
potential and capacity, we believe more sophisticated regularizers can be
designed for PETs and better performance can be achieved in the future. The
code is released at
\url{https://github.com/thunlp/stochastic-bridge-pet/tree/main}.
Comment: ACL 2023 Findings
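The running-cost idea can be illustrated with a deliberately simplified stand-in (not the paper's latent-space formulation): treat the stack of intermediate hidden states as a path and penalise its deviation from a Brownian bridge pinned at the first and last states. The bridge mean at time t is the linear interpolation of the endpoints, and the t(1-t) bridge variance makes deviations near the endpoints cost more.

```python
import numpy as np

def bridge_running_cost(states):
    """Simplified sketch of a stochastic-bridge running cost: penalise
    intermediate states' squared deviation from the Brownian-bridge mean
    between the first and last states, scaled by the bridge variance."""
    L = len(states) - 1
    h0, hT = states[0], states[-1]
    cost = 0.0
    for i in range(1, L):
        t = i / L
        mean = (1 - t) * h0 + t * hT       # bridge mean at time t
        var = t * (1 - t)                  # bridge variance at time t
        cost += np.sum((states[i] - mean) ** 2) / var
    return cost / max(L - 1, 1)
```

In a PET this term would be added to the task loss as the running cost, alongside the usual terminal cost on the output state.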
Systematic Analysis of Impact of Sampling Regions and Storage Methods on Fecal Gut Microbiome and Metabolome Profiles.
The contribution of human gastrointestinal (GI) microbiota and metabolites to host health has recently become much clearer. However, many confounding factors can influence the accuracy of gut microbiome and metabolome studies, resulting in inconsistencies in published results. In this study, we systematically investigated the effects of fecal sampling regions and storage and retrieval conditions on gut microbiome and metabolite profiles from three healthy children. Our analysis indicated that compared to homogenized and snap-frozen samples (standard control [SC]), different sampling regions did not affect microbial community alpha diversity, while a total of 22 of 176 identified metabolites varied significantly across different sampling regions. In contrast, storage conditions significantly influenced the microbiome and metabolome. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles. Sample storage in RNALater showed a significant level of variation in both microbiome and metabolome profiles, independent of the storage or retrieval conditions. The effect of RNALater on the metabolome was stronger than the effect on the microbiome, and individual variability between study participants outweighed the effect of RNALater on the microbiome. We conclude that homogenizing stool samples was critical for metabolomic analysis but not necessary for microbiome analysis. Short-term room temperature storage had a minimal effect on the microbiome and metabolome profiles and is recommended for short-term fecal sample storage. In addition, our study indicates that the use of RNALater as a storage medium of stool samples for microbial and metabolomic analyses is not recommended.
IMPORTANCE: The gastrointestinal microbiome and metabolome can provide a new angle to understand the development of health and disease. Stool samples are most frequently used for large-scale cohort studies. Standardized procedures for stool sample handling and storage can be a determining factor for performing microbiome or metabolome studies. In this study, we focused on the effects of stool sampling regions and stool sample storage conditions on variations in the gut microbiome composition and metabolome profile.
Infection and Infertility
Infection is a multifactorial process that can be induced by a virus, bacterium, or parasite. It may cause many diseases, including obesity, cancer, and infertility. In this chapter, we focus our attention on the association between infection and fertility alteration. Numerous studies have suggested that genetic polymorphisms influencing infection are associated with infertility. Therefore, we also review the genetic influence on infection and the risk of infertility.